A Maximum Likelihood Approach to Deep Neural Network Based Nonlinear Spectral Mapping for Single-Channel Speech Separation
نویسندگان
چکیده
In contrast to the conventional minimum mean squared error (MMSE) training criterion for nonlinear spectral mapping based on deep neural networks (DNNs), we propose a probabilistic learning framework to estimate the DNN parameters for singlechannel speech separation. A statistical analysis of the prediction error vector at the DNN output reveals that it follows a unimodal density for each log power spectral component. By characterizing the prediction error vector as a multivariate Gaussian density with zero mean vector and an unknown covariance matrix, we present a maximum likelihood (ML) approach to DNN parameter learning. Our experiments on the Speech Separation Challenge (SSC) corpus show that the proposed learning approach can achieve a better generalization capability and a faster convergence than MMSE-based DNN learning. Furthermore, we demonstrate that the ML-trained DNN consistently outperforms MMSE-trained DNN in all the objective measures of speech quality and intelligibility in single-channel speech separation.
منابع مشابه
شبکه عصبی پیچشی با پنجرههای قابل تطبیق برای بازشناسی گفتار
Although, speech recognition systems are widely used and their accuracies are continuously increased, there is a considerable performance gap between their accuracies and human recognition ability. This is partially due to high speaker variations in speech signal. Deep neural networks are among the best tools for acoustic modeling. Recently, using hybrid deep neural network and hidden Markov mo...
متن کاملDeep Recurrent Neural Network Based Monaural Speech Separation Using Recurrent Temporal Restricted Boltzmann Machines
This paper presents a single-channel speech separation method implemented with a deep recurrent neural network (DRNN) using recurrent temporal restricted Boltzmann machines (RTRBM). Although deep neural network (DNN) based speech separation (denoising task) methods perform quite well compared to the conventional statistical model based speech enhancement techniques, in DNN-based methods, the te...
متن کاملFeature mapping using far-field microphones for distant speech recognition
Acoustic modeling based on deep architectures has recently gained remarkable success, with substantial improvement of speech recognition accuracy in several automatic speech recognition (ASR) tasks. For distant speech recognition, the multi-channel deep neural network based approaches rely on the powerful modeling capability of deep neural network (DNN) to learn suitable representation of dista...
متن کاملAn investigation of spectral restoration algorithms for deep neural networks based noise robust speech recognition
Deep Neural Networks (DNNs) are becoming widely accepted in automatic speech recognition (ASR) systems. The deep structured nonlinear processing greatly improves the model’s generalization capability, but the performance under adverse environments is still unsatisfactory. In the literature, there have been many techniques successfully developed to improve Gaussian mixture models’ robustness. In...
متن کاملAn efficient method for cloud detection based on the feature-level fusion of Landsat-8 OLI spectral bands in deep convolutional neural network
Cloud segmentation is a critical pre-processing step for any multi-spectral satellite image application. In particular, disaster-related applications e.g., flood monitoring or rapid damage mapping, which are highly time and data-critical, require methods that produce accurate cloud masks in a short time while being able to adapt to large variations in the target domain (induced by atmospheric c...
متن کامل